Multi-ported memories (MPMs) are essential components of parallel computing systems that require high
performance. MPMs are commonly used in most processors and advanced systems-on-chip (SoCs) for faster
computation and high-speed processing. In this manuscript, efficient MPMs are designed by integrating the
hierarchical bank division with XOR (HBDX) and bank division with remap table (BDRT) approaches. The
BDRT approach is configured using a remap table with a hash write controlling mechanism to avoid write
conflicts. The designs of multiple read ports using the bank division with XOR (BDX) and HBDX approaches
are discussed in detail. The results of the 2W4R and 3W4R memory modules are analyzed in terms of chip
area, operating frequency (MHz), block random access memories (BRAMs), and throughput (Gbps) for
different memory depths on a Virtex-7 field programmable gate array (FPGA). For a 16K memory depth, the
2W4R design utilizes 2.27% of slices and operates at 268 MHz while consuming 64 BRAMs. Similarly, the
3W4R design uses 2.28% of slices and operates at 250 MHz while consuming 96 BRAMs. Compared with
existing MPM approaches on the same FPGA device, the proposed designs achieve better chip utilization
(slices), frequency, and BRAM usage.
1. INTRODUCTION

Multi-ported memory (MPM) modules are required in parallel computing systems and processors to
obtain high performance. Parallel computing systems use high-end register files along with shared-memory
architectures to run their operations, so memory modules with multiple parallel write and read ports and high
bandwidth are required to run high-end parallel systems [1]. Many systems, such as vector processors, digital
signal processing (DSP) systems, chip multiprocessors (CMPs), and very long instruction word (VLIW)
processors, rely on MPMs for parallel computation and access. An MPM module can be designed using
registers and logic elements, but this is feasible only for low-end memory modules. Alternatively, the static
random-access memory (SRAM) bit cell can be modified to provide more ports, but the area grows as the
number of read and write ports increases. Field programmable gate arrays (FPGAs) therefore play an
essential role, since they allow area resources to be customized with minimal effort. FPGA devices, rather
than application-specific integrated circuits (ASICs), are used to build complex applications that achieve high
system speed. An ASIC implementation of MPMs performs multiple write and read operations
simultaneously and avoids contention and serialization. FPGA-based systems, in contrast, use a simple
instruction pipeline and are less affected by instruction-level parallelism (ILP). Even advanced FPGAs
natively support only dual-ported RAMs, so the user has to convert these into multi-ported memory modules
using additional logic elements and block RAMs. The limitations of ILP are analyzed in many existing works
to improve processor performance; in practice, the ILP that can be exploited for MPM on FPGA is small and
ineffective [2], [3].

Many approaches are available to design MPMs on FPGAs, including pure logic element (LE) based,
replication, banking, and multi-pumping approaches. MPM performance is evaluated in terms of LE usage,
BRAM usage, maximum operating frequency (Fmax), latency, and throughput [4]. Scalability is improved by
making the number of write/read ports and the addressable memory space configurable. Latency and
throughput can be improved by using a proper unsegmented address space in MPMs [5]. Reliability and
portability are enhanced by integrating first-in-first-out buffers (FIFOs) with multiple ports on FPGA [6].
Multi-port shared memory architectures are used to access parallel data; they provide high performance and
are suitable for multi-core architectures. High-speed communication can be established only by using
advanced bus architectures or networks on chip (NoC) with shared multi-ported memories [7]. Ternary content
addressable memory (TCAM) is used in most high-speed applications but suffers from high power
consumption on FPGA. TCAM built with multi-ported static random-access memory (SRAM) and the
multi-pumping method therefore minimizes resource utilization and improves power efficiency on FPGA [8].
Pseudo multi-ported SRAM is also used extensively in image processing applications, especially display
drivers [9]. Multi-match packet classification is easily achieved with minimal soft errors using TCAM on
FPGA [10]. High bandwidth memory (HBM) offers parallelism and pipelining solutions with better throughput
for complex systems, such as neuromorphic neural-network systems, in real-time scenarios [11]. MPMs are
used in many advanced applications, such as image processing, radar, real-time processing, data acquisition
systems, and multiprocessor systems-on-chip (MPSoC) [12]-[14].

In this manuscript, efficient hybrid-mode multi-ported memory modules on the FPGA platform are
discussed. The contributions of this work are as follows: (i) the proposed MPMs provide a cost-effective
design solution with high system performance for complex systems; (ii) the BDRT approach is modified by
using a remap table with a hash write controller to avoid write conflicts; and (iii) the performance of the
proposed designs is discussed in detail and compared with the existing XOR, live value table (LVT), and
BDRT approaches, showing improved resource usage. The rest of Section 1 discusses related works on
multi-ported memory modules using software and hardware approaches and their findings. The multi-ported
memory module using the HBDX and BDRT approaches is discussed in detail in Section 2. The results and
analysis of the proposed design for different memory depths are discussed in Section 3. Finally, Section 4
concludes the work.

Existing works on MPM modules with different approaches in both software and hardware environments
are discussed in this section. Muddebihal and Purdy [15] present area-efficient multi-ported memory modules
for the FPGA environment. The work uses an improved hardware mechanism for write-conflict detection and
resolution to save BRAM and chip resources. However, the design achieves limited operating frequency on
the FPGA platform and still needs to be investigated for higher-order multi-port banks. Lai and Lin [16]
discuss an efficient design approach for multi-ported memories on the FPGA platform. The method uses a
hierarchical approach and saves up to 69% of BRAMs for a 2W4R design compared with the LVT approach.
Abdelhadi and Lemieux [17] present a multi-ported memory compiler using true dual-port BRAMs on FPGA.
The work uses multiple switched ports to build optimized multi-ported RAMs. The results, evaluated on
different test cases, show better BRAM and LE usage than other approaches. Strollo and Trifiletti [7] present a
configurable, parameterized shared memory architecture for multiprocessor systems. The design reduces the
interconnections among processors by sharing memory locations.

Lai and Huang [18] present an algorithmic multi-ported memory with a hierarchical banking
architecture on the FPGA platform. This non-table-based MPM uses a banking architecture to reduce BRAM
usage. Different MPMs are realized with varying memory depths, and their hardware resource usage is
reported. Shahrouzi and Perera [19] discuss an FPGA-based memory architecture for computing applications.
The designed MPM addresses issues such as on-chip area, BRAM utilization, and feasibility for
compute-intensive applications. The memory architecture is built from memory bank, encoder, decoder, and
multiplexer modules. However, the work takes more clock cycles to produce the read output of the memory
module. Patil and Musle [20] present an MPM with fault detection and repair using built-in self-test (BIST) on
the FPGA platform. Synchronous and asynchronous micro-code BIST (MBIST) is used to test the MPM
modules. The work verifies the design's functionality through simulation results only.

Manivannan and Srinivasan [21] present a 2W4R multi-port register file using pulsed latches. The
work offers lower power consumption for the 2W4R architecture than an SRAM-based register file. The
design is limited to an 8-bit depth for the 2W4R multi-port register file. Humenniy et al. [22] discuss a shared
access memory architecture for data transmission applications. The design uses a software-based approach
for shared memory based on the residue number system with data protection. Ullah et al. [23] present a
ternary content-addressable memory (TCAM) design based on multi-ported SRAM with multi-pumping on
FPGA. The work offers 2.85 times better performance per memory module than existing approaches, but the
TCAM-based process consumes more BRAMs and hardware resources on FPGA. Navagiri and
Muthumanickam [24] elaborate an LVT-based MPM module on FPGA. The design uses conventional
approaches and consumes more BRAMs and other hardware resources than XOR and advanced hybrid
approaches. Shahrouzi and Perera [25] present MPM architectures using an optimized counter mechanism for
next-generation FPGAs. The work analyzes the routing complexity of MPMs, from the most straightforward
to complex architectures. Different MPM architectures, namely an encoded parallel model, direct data
forwarding, encoded forwarding, and binary-coded forwarding modules, are designed using the optimized
counter mechanism. Chen et al. [26] present an algorithm-based MPM with an efficient write mechanism. The
work uses a remap-table-based architecture for multiple write ports. Shahrouzi et al. [27] discuss MPMs
composed with bi-directional ports for advanced FPGAs. The work analyzes uni-directional and
bi-directional MPMs using a decision-making module (DMM) and reports better BRAM and other hardware
resource usage than conventional approaches. Zhang et al. [28] present an XOR-based MPM with high
throughput using a parallel hash table. The work analyzes different performance metrics on both the Xilinx
U250 FPGA and Intel devices.


2. MULTI-PORTED MEMORY MODULE 

This section explains the detailed MPM architecture for both the write-port and read-port approaches.
The multiple read ports are designed using the bank division with XOR (BDX) and hierarchical BDX (HBDX)
approaches. The multiple write ports are created using the bank division with remap table (BDRT) approach.
The multiple write and read ports are then integrated to construct the MPM using the HBDX and BDRT
approaches.


2.1. Multiple read port approaches 

In this work, several read-port designs are developed, namely the 1W2R module, the 1W2R mode, the
4R mode, and the HBDX-based 1W4R module. BDX adapts the conventional XOR approach to increase the
number of read ports rather than write ports: the XOR approach uses multiple single-port RAM (1W1R)
modules to increase the write ports, whereas the BDX approach uses many single-port RAMs to increase the
read ports. The BDX-based 1W2R design example is illustrated in Figure 1. It mainly contains five memory
modules: four memory banks (MBs) and one XOR bank (XB). The four MBs are denoted MB0, MB1, MB2,
and MB3. Each MB can perform one write and two read operations in parallel. Each MB provides address
locations 0 to N-1, where N denotes the memory depth of an MB, and stores data at the corresponding
address locations.
Figure 1. BDX-based 1W2R design example
In this 1W2R example, the write value W0 is stored in MB0 at location 0, denoted WD(0,0), where
WD(i,j) is the data held by MBi at location j. At the same time, R0 and R1 initially read data from MB0 at
locations 0 and 1. The XB stores, at each location, the XOR of the data values held by all MBs at that location.
For the last location (N-1), the XB content is given by (1).

$$X_{N-1} = WD_{(0,N-1)} \oplus WD_{(1,N-1)} \oplus WD_{(2,N-1)} \oplus WD_{(3,N-1)} \qquad (1)$$

After the XB update, R0 and R1 are given by (2) and (3).

$$R_0 = WD_{(0,0)} \qquad (2)$$

$$R_1 = WD_{(1,1)} \oplus WD_{(2,1)} \oplus WD_{(3,1)} \oplus X_1 \qquad (3)$$

R0 is obtained directly from MB0 at location 0, whereas R1 cannot access MB0 directly because of the
write conflict in MB0. Instead, R1 recovers the data value by XORing the location-1 entries of MB1, MB2,
MB3, and XB. The data of any other bank can be recovered in the same way; for example, the data of MB3 at
the last location is given by (4).

$$WD_{(3,N-1)} = WD_{(0,N-1)} \oplus WD_{(1,N-1)} \oplus WD_{(2,N-1)} \oplus X_{N-1} \qquad (4)$$
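The XOR-bank relations in (1) to (4) can be checked with a small behavioral model. The Python sketch below is a hypothetical illustration, not the authors' RTL: it assumes integer data words, zero-initialized banks, and an XB that is updated immediately on every write, and it ignores BRAM port timing. The class and method names are our own.

```python
from functools import reduce
from operator import xor


class XorBankedMemory:
    """Hypothetical behavioral model of BDX: num_banks MBs plus one XOR bank (XB).

    Invariant, as in (1): xb[j] == WD(0,j) ^ WD(1,j) ^ ... ^ WD(num_banks-1,j).
    """

    def __init__(self, num_banks: int, depth: int) -> None:
        self.banks = [[0] * depth for _ in range(num_banks)]
        self.xb = [0] * depth  # all banks start at zero, so every XOR is zero

    def write(self, bank: int, loc: int, value: int) -> None:
        # Incremental XB update: XOR out the old word and XOR in the new one.
        old = self.banks[bank][loc]
        self.banks[bank][loc] = value
        self.xb[loc] ^= old ^ value

    def read_direct(self, bank: int, loc: int) -> int:
        # Direct read from the owning bank, as for R0 in (2).
        return self.banks[bank][loc]

    def read_recovered(self, bank: int, loc: int) -> int:
        # Rebuild WD(bank, loc) from the other banks and XB, as in (3) and (4),
        # without touching the busy target bank.
        others = (b[loc] for i, b in enumerate(self.banks) if i != bank)
        return reduce(xor, others, self.xb[loc])


mem = XorBankedMemory(num_banks=4, depth=8)
for b in range(4):
    mem.write(b, 1, 0x10 + b)   # fill location 1 of every bank
mem.write(0, 0, 0xCAFE)         # W0 -> WD(0,0)
r0 = mem.read_direct(0, 0)      # R0 = WD(0,0)
r1 = mem.read_recovered(0, 1)   # R1 = WD(1,1) ^ WD(2,1) ^ WD(3,1) ^ X1
print(hex(r0), hex(r1))         # prints 0xcafe 0x10
```

The incremental update (`old ^ value`) keeps the XB consistent after any sequence of writes, which is what allows a read to bypass a bank whose ports are occupied.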


Extending the 1W2R approach to n read ports is difficult because of data duplication and the
write-conflict mechanism, which cause more area overhead and degrade system performance on FPGA. The
HBDX approach is therefore used to address these issues, providing better performance with minor chip-area
overhead.

The two-mode approach operates in either 1W2R mode or 4R mode and is used as a hybrid building
block for the HBDX design. The 1W2R mode and the 4R mode using BDX are illustrated in Figures 2(a) and
2(b), respectively. As Figure 2(a) shows, the 1W2R mode works the same way as the 1W2R module with a
few additional mechanisms. R0 reads data directly from MB0 at location 0. R1 reads data from banks MB1,
MB2, MB3, and XB at the same location. The read update (Ru) updates the XOR bank by reading the
corresponding memory locations. Figure 2(b) illustrates the 4R mode, which receives no write requests or
data and performs only four read operations. R0 and R2 are read simultaneously from MB0, whereas R1 and
R3 access data from banks MB1, MB2, MB3, and XB.
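A minimal sketch of how the 4R mode can serve four reads that all target the same bank in one cycle is shown below. It assumes, as an illustration only, that every bank offers two read ports: R0 and R2 use the two ports of the target bank, while R1 and R3 each use one port of every other bank and of XB. The function names are hypothetical.

```python
from functools import reduce
from operator import xor

NUM_BANKS = 4
DEPTH = 16


def build_banks(words):
    """Split words into NUM_BANKS banks and compute the XOR bank (XB)."""
    banks = [list(words[b * DEPTH:(b + 1) * DEPTH]) for b in range(NUM_BANKS)]
    xb = [reduce(xor, (banks[b][j] for b in range(NUM_BANKS))) for j in range(DEPTH)]
    return banks, xb


def read_4r_mode(banks, xb, target, locs):
    """Serve R0..R3, all addressed to bank `target`, in a single cycle.

    R0 and R2 occupy the two (assumed) read ports of the target bank;
    R1 and R3 are rebuilt from the other banks and XB, one port each.
    """
    assert len(locs) == 4
    r0 = banks[target][locs[0]]
    r2 = banks[target][locs[2]]

    def rebuild(loc):
        others = (banks[b][loc] for b in range(NUM_BANKS) if b != target)
        return reduce(xor, others, xb[loc])

    return [r0, rebuild(locs[1]), r2, rebuild(locs[3])]


words = [(i * 2654435761) % (1 << 32) for i in range(NUM_BANKS * DEPTH)]
banks, xb = build_banks(words)
print(read_4r_mode(banks, xb, target=0, locs=[3, 7, 3, 12]))
```

Each non-target bank is read at most twice (once for R1, once for R3), so the schedule fits within two ports per bank under the stated assumption.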


Figure 2. BDX-based 1W2R/4R mode design example: (a) 1W2R mode and (b) 4R mode


The HBDX approach overcomes duplication and write conflicts by combining the 1W2R and 4R modes.
The HBDX-based 1W4R design example for the worst-case scenario is illustrated in Figure 3. MB0 operates in
1W2R mode, and the other banks (MB1, MB2, MB3, and XB) operate in 4R mode. The data W0 is written
into MB0, and R0 and R2 are read simultaneously from the corresponding locations. In contrast, banks MB1,
MB2, MB3, and XB read the related memory locations and perform XOR operations to generate the
remaining read outputs, R1 and R3. The corresponding updated read values from the banks are XORed to
create the final read update (Ru) value.
Figure 3. HBDX-based 1W4R design example


2.2. Multiple write port approach 

The BDRT approach for increasing the number of write ports was initially proposed by Lai et al. [12].
It is similar to conventional approaches such as LVT and XOR, with a few modifications. The BDRT
approach supports multiple writes by using a remap table and extra BRAMs, avoiding the replication
mechanism. The remap table records the memory location of the most recently written data in the memory
module. The BDRT-based 2W1R memory module example is illustrated in Figure 4. It mainly contains one
remap table, two memory banks (MB0 and MB1), and one bank buffer (BB). The BB initially holds null
entries, which do not contain any valid user data. The write requests W0 and W1 are first looked up in the
remap table to identify the bank and location in which to store the data. Based on the remap table, both W0
and W1 target bank MB0, at address locations 0 and 1, respectively. However, an MB can accept only one
write at a time. Therefore, W0 is written to location 0 of MB0, and W1 is redirected to location 1 of the BB
(offset 1), which held a null entry. The remap table entry for location 1 is then updated to the identification
number 2, which denotes the BB, as shown in Figure 4(b). In the final stage, the BB stores the W1 data at
location 1 according to the updated remap table. The BDRT approach uses less chip area and fewer BRAMs
than the conventional LVT approach, but it needs additional registers to implement the remap table.
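The remap-table mechanism of Figure 4 can be sketched behaviorally in Python. This is a hypothetical model, not the authors' design: it assumes a logical address is split into (bank, offset), that a conflicting write is redirected to a free slot with the same offset, and that the lowest-numbered idle bank is chosen. The hash write controller is not modeled, and all names are our own. The usage lines reproduce Figure 4, where W1 ends up in the BB and the remap entry for location 1 becomes 2.

```python
class BdrtMemory:
    """Hypothetical behavioral model of bank division with a remap table (BDRT).

    Logical address = (logical bank, offset). Physical banks 0..num_mbs-1 are MBs;
    the remaining num_writes-1 physical banks are bank buffers (BBs), so every
    offset row always keeps num_writes-1 free ("null") slots.
    """

    def __init__(self, num_mbs: int, depth: int, num_writes: int = 2) -> None:
        self.num_writes = num_writes
        num_phys = num_mbs + num_writes - 1
        self.data = [[None] * depth for _ in range(num_phys)]
        # remap[offset][lbank] = physical bank currently holding that logical word.
        self.remap = [list(range(num_mbs)) for _ in range(depth)]
        # free[offset] = physical banks whose slot at this offset holds no live data.
        self.free = [set(range(num_mbs, num_phys)) for _ in range(depth)]

    def write_cycle(self, requests) -> None:
        """Apply up to num_writes (lbank, offset, value) writes in one cycle.

        Each physical bank has one write port, so a write to an already busy bank
        is redirected to a free slot of the same offset row in an idle bank.
        """
        assert len(requests) <= self.num_writes
        busy = set()
        for lbank, off, value in requests:
            target = self.remap[off][lbank]
            if target in busy:
                new_target = min(b for b in self.free[off] if b not in busy)
                self.free[off].remove(new_target)
                self.free[off].add(target)  # the old slot becomes a null entry
                self.remap[off][lbank] = new_target
                target = new_target
            self.data[target][off] = value
            busy.add(target)

    def read(self, lbank: int, off: int):
        # The remap entry acts as the select line of the output multiplexer.
        return self.data[self.remap[off][lbank]][off]


# Figure 4 scenario: W0 and W1 both map to MB0 (locations 0 and 1) in one cycle.
mem = BdrtMemory(num_mbs=2, depth=4)
mem.write_cycle([(0, 0, "W0"), (0, 1, "W1")])
print(mem.read(0, 0), mem.read(0, 1), mem.remap[1][0])  # prints W0 W1 2
```

In this model, n-1 buffer banks keep n-1 free slots in every offset row, which is enough for greedy redirection of n distinct writes per cycle: when the k-th write conflicts, at most k-2 other busy banks can hold a free slot of its row, leaving at least n-k+1 free slots in idle banks.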


Figure 4. BDRT-based 2W1R memory module example: (a) Initialization stage and (b) Final stage
2.3. Integrated MPM module

The integration of BDRT and HBDX for the write and read ports, respectively, in the nWmR memory
module is illustrated in Figure 5. It mainly contains BDRT and HBDX modules, where the memory module is
divided into 'd' data banks.


Figure 5. nWmR memory implementation using BDRT with HBDX approach


The BDRT module uses the remap table to deliver all required write data to the HBDX module. For n
write ports, the BDRT requires n-1 BBs. Additionally, a hash write controlling mechanism is incorporated in
the BDRT approach to distribute the write data properly to the corresponding banks in the HBDX module.
The 2W4R memory module using BDRT with the HBDX approach is illustrated in Figure 6. The 2W4R
implementation requires four memory banks and two bank buffers. The MBs and BBs can be designed using
the 1W2R module, the 1W2R mode, or the 4R mode. In this work, the 1W2R memory module is used for both
the MBs and the BBs to avoid write conflicts. The four MBs and two BBs (all 1W2R memory modules)
receive the write data and generate the corresponding read data based on the HBDX approach. The remap
table drives the select lines of the multiplexers that generate the final four outputs (R0, R1, R2, and R3). For
the 3W4R implementation, four MBs are still used and the number of BBs is increased to three; only the BBs
need to be increased to resolve the write conflicts in the 3W4R memory module.
Figure 6. 2W4R memory module using BDRT with HBDX approach
3. RESULTS AND DISCUSSION

This section presents the detailed implementation results of the MPMs on a Virtex-7 FPGA. The
results for multiple read ports and for the integration of HBDX with the BDRT approach in the 2W4R and
3W4R memory modules are discussed. A data width of 32 bits is used for all MPM designs. Performance
parameters, namely chip area utilization (slices and LUTs), maximum operating frequency (Fmax) in MHz,
BRAMs, and throughput (Gbps), are analyzed for different memory depths of the 2W4R and 3W4R memory
designs. The designs are implemented on a Virtex-7 XC7V585T device in the FFG1761 package [29]. The
Virtex-7 FPGA has abundant resources and supports high-performance features for complex designs in
real-time scenarios. The XC7V585T mainly contains 582K logic cells, 91K slices, 728K CLB flip-flops, 795
BRAMs, 1,260 DSP slices, 850 input-output blocks (IOBs), and 6,938 Kb of distributed RAM. The
performance of multiple read ports using the BDX approach on the Virtex-7 FPGA is tabulated in Table 1.
The 1W2R module, 1W2R mode, and 4R mode (using BDX) and the 1W4R module (using HBDX) are
implemented, each for 8K and 16K memory depths. The chip area utilization of the multiple read ports on the
Virtex-7 FPGA is illustrated in Figure 7.


Table 1. Performance analysis of multiple-read ports using BDX approach on Virtex-7 FPGA

Resources     BDX_1W2R          BDX_1W2R Mode     BDX_4R Mode       HBDX_1W4R
              8K       16K      8K       16K      8K       16K      8K       16K
Slices        96       96       96       96       81       41       459      405
LUTs          96       96       65       65       80       40       241      185
BRAMs         16       32       16       32       16       16       32       48
Fmax (MHz)    1759.01  1759.01  1463.05  1463.05  1266.22  1272.14  1223.54  1223.54


The BRAM utilization and obtained frequency (Fmax) for the multiple read ports are shown in Figure
8(a) and Figure 8(b), respectively. The 1W2R module utilizes only 96 slices and 96 LUTs for both 8K and
16K memory depths on Virtex-7. It uses 16 and 32 BRAMs for 8K and 16K, respectively, and operates at
1759 MHz. The 1W2R mode module utilizes only 96 slices for both 8K and 16K memory depths and operates
at 1463.05 MHz on Virtex-7; it utilizes 16 and 32 BRAMs for 8K and 16K memory depths, respectively. The
4R mode module using BDX utilizes only 81 and 41 slices for 8K and 16K memory depths, respectively, and
only 16 BRAMs for both depths. The 4R mode module operates at 1266.22 MHz and 1272.14 MHz for 8K
and 16K memory depths, respectively, on the Virtex-7 FPGA. The 1W4R module using the HBDX approach
utilizes 459 and 405 slices for 8K and 16K memory depths, respectively, and operates at 1223.54 MHz on
Virtex-7. It utilizes 32 and 48 BRAMs for 8K and 16K memory depths, respectively.
Figure 7. Chip area utilization of multiple-read ports on Virtex-7 FPGA

Figure 8. BRAM and frequency parameter analysis on Virtex-7 FPGA for multiple read ports: (a) BRAM
utilization vs. memory depth and (b) Fmax (MHz) vs. memory depth


The performance of the 2W4R and 3W4R memory modules on the Virtex-7 FPGA is illustrated in
Figure 9. Chip area (slices and LUTs), frequency, BRAMs, and throughput are evaluated for different memory
depths. The 2W4R memory module utilizes 1.15% of slices and 12% of LUTs for an 8K memory depth, and
2.27% of slices and 29% of LUTs for a 16K memory depth on the Virtex-7 FPGA, as shown in Figure 9(a).
The 3W4R memory module utilizes 1.165% of slices and 20% of LUTs for an 8K memory depth, and 2.28%
of slices and 36% of LUTs for a 16K memory depth, as shown in Figure 9(b). The 2W4R memory module
operates at 292.9 MHz for 8K and 267.3 MHz for 16K, whereas the 3W4R memory module operates at 270.1
MHz for 8K and 250.9 MHz for 16K memory depth on the Virtex-7 FPGA, as illustrated in Figure 9(c).

The throughput is calculated from the latency, frequency, and input data width. A latency of 2.5
clock cycles applies to both the 2W4R and 3W4R memory designs. The 2W4R memory module achieves 3.74
Gbps for 8K and 3.42 Gbps for 16K, and the 3W4R memory module achieves 3.45 Gbps for 8K and 3.2 Gbps
for 16K memory depth on the Virtex-7 FPGA, as illustrated in Figure 9(d). The 2W4R memory module
utilizes 32 BRAMs for 8K and 64 for 16K, whereas the 3W4R memory module uses 48 BRAMs for 8K and 96
for 16K memory depth, as illustrated in Figure 9(e). For both memory modules, BRAM utilization doubles
when the memory depth doubles, i.e., it grows linearly with depth.
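The text does not state the throughput formula explicitly. A natural reading, assumed in the sketch below, is throughput = Fmax × data width / latency. With the 32-bit width and 2.5-cycle latency it gives 3.749, 3.421, 3.457, and 3.212 Gbps, which agree with the reported values to within rounding or truncation. The function name is hypothetical.

```python
def throughput_gbps(fmax_mhz: float, width_bits: int = 32, latency_cycles: float = 2.5) -> float:
    """Assumed formula: Fmax x data width / latency, returned in Gbps."""
    return fmax_mhz * 1e6 * width_bits / latency_cycles / 1e9


# Fmax values reported in Section 3 (32-bit data, 2.5-cycle latency).
designs = {"2W4R 8K": 292.9, "2W4R 16K": 267.3, "3W4R 8K": 270.1, "3W4R 16K": 250.9}
for name, fmax in designs.items():
    print(f"{name}: {throughput_gbps(fmax):.3f} Gbps")
```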

The design approach, FPGA device, resource utilization (slices), Fmax, and BRAMs for different
memory depths are considered for comparison. The comparison of the proposed 2W4R memory module with
existing approaches is tabulated in Table 2. The proposed 2W4R using the BDRT approach achieves a 7.8%
(8K) and 2.98% (16K) better operating frequency than the existing XOR-based 2W4R [30], and it uses 60%
fewer BRAMs for both 8K and 16K depths. Compared with the existing LVT-based 2W4R [4], the proposed
2W4R has 93.11% (8K) and 92.92% (16K) less area overhead and uses 50% fewer BRAMs for both depths.
Compared with the existing BDRT-based 2W4R system [16], the area is drastically reduced by 94.33% (8K)
and 95.71% (16K), and the operating frequency is 50.85% (8K) and 52.23% (16K) higher. The bi-directional
MPM in [27] is designed using a decision-making module (DMM) on a Virtex-6 FPGA; the proposed 2W4R
module uses 33.33% fewer BRAMs than this approach for both 8K and 16K depths.
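The percentages quoted above can be reproduced from Table 2. In the sketch below (the helper names are our own), resource savings are taken relative to the reference design, while the frequency gains appear to be taken relative to the proposed design's Fmax; this convention is inferred from the numbers, not stated in the text. Measured relative to the reference design instead, the 8K gain over [16] would be about 103%.

```python
def fewer_pct(proposed: float, reference: float) -> float:
    """Resource saving of the proposed design, relative to the reference design."""
    return (reference - proposed) / reference * 100


def faster_pct(proposed_mhz: float, reference_mhz: float) -> float:
    """Fmax gap relative to the proposed design's Fmax (inferred convention)."""
    return (proposed_mhz - reference_mhz) / proposed_mhz * 100


# Table 2 values: proposed 2W4R vs. the XOR [30], LVT [4] and BDRT [16] designs.
print(f"Fmax vs [16], 8K:   {faster_pct(293, 144):.2f}%")   # 50.85%
print(f"Fmax vs [16], 16K:  {faster_pct(268, 128):.2f}%")   # 52.24%
print(f"Slices vs [4], 8K:  {fewer_pct(1.15, 16.7):.2f}%")  # 93.11%
print(f"BRAMs vs [30], 16K: {fewer_pct(64, 160):.2f}%")     # 60.00%
# The same 8K frequency gap measured relative to the reference design:
print(f"{(293 - 144) / 144 * 100:.2f}%")                    # 103.47%
```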

The comparison of the proposed 3W4R memory module with existing approaches is tabulated in Table
3. The proposed 3W4R memory design provides better Fmax and BRAM usage than the existing XOR-based
3W4R [30]. Similarly, the proposed 3W4R design uses less area and fewer BRAMs and operates at a better
frequency than the existing LVT-based 3W4R [4] on the Virtex-7 FPGA. The BDRT-based 2W4R and 3W4R
memories [16] consume fewer BRAMs than the proposed designs but consume more slices and LUTs. The
DMM-based 3W4R module [27] utilizes 96 and 192 BRAMs for 8K and 16K memory depths, respectively, so
the proposed 3W4R module uses 50% fewer BRAMs for both depths. The MPM designs are analyzed for
different depths: as the memory depth increases, slice and BRAM utilization increase while the operating
frequency decreases. Overall, the proposed 2W4R and 3W4R memory modules utilize less chip area and fewer
BRAMs and provide better operating frequency on FPGA hardware than similar approaches.
Table 2. Comparison of proposed 2W4R memory module with existing approaches

                                        8K memory depth                16K memory depth
Design works  Approach  FPGA      Slices (%)  Fmax (MHz)  BRAMs   Slices (%)  Fmax (MHz)  BRAMs
Ref [30]      XOR       Virtex-7  0.8         270         80      0.4         260         160
Ref [4]       LVT       Virtex-7  16.7        149         64      32.1        116         128
Ref [16]      BDRT      Virtex-7  20.3        144         30      53          128         60
Ref [27]      DMM       Virtex-6  0.8         108         48      0.9         115         96
Proposed      BDRT      Virtex-7  1.15        293         32      2.27        268         64


Table 3. Comparison of proposed 3W4R memory module with existing approaches

                                        8K memory depth                16K memory depth
Design works  Approach  FPGA      Slices (%)  Fmax (MHz)  BRAMs   Slices (%)  Fmax (MHz)  BRAMs
Ref [30]      XOR       Virtex-7  0.6         220         144     0.9         199         288
Ref [4]       LVT       Virtex-7  35.4        111         96      66.4        86          192
Ref [16]      BDRT      Virtex-7  38.4        121         40      71.2        94          80
Ref [27]      DMM       Virtex-6  0.9         127.78      96      1.1         138.73      192
Proposed      BDRT      Virtex-7  1.165       270         48      2.28        250         96


4. CONCLUSION 

In this manuscript, efficient multi-ported memory modules are designed and implemented on FPGA.
The proposed MPMs offer more cost-effective design solutions, in terms of resources and performance, than
existing MPM approaches. The multiple read ports are designed using the BDX and HBDX approaches, and the
multiple write ports are created using the BDRT approach with a remap table. The remap table is configured
with a hash write controlling mechanism to avoid write conflicts in the BDRT approach. The multiple
read-port techniques using the BDX and HBDX approaches are discussed along with their performance. The
integration of BDRT with the HBDX approach is used to construct the MPM modules, and the 2W4R and
3W4R memory modules are implemented and their results analyzed. The proposed 2W4R and 3W4R modules
utilize less chip area and fewer BRAMs and operate at higher frequencies than the existing approaches. For a
16K memory depth, the 2W4R memory module uses 60% and 50% fewer BRAMs than the existing XOR and
LVT approaches, respectively, and the proposed 3W4R memory module uses 66.6% and 50% fewer BRAMs
than the existing XOR and LVT approaches, respectively. The proposed 2W4R and 3W4R memory modules
operate at 52.23% and 62.4% higher frequencies, respectively, than the existing BDRT approach for a 16K
memory depth.